03. End-to-End Learning

End-to-End Learning

Traditionally, object detection in images requires a pipeline with many steps, such as locating keypoints, extracting some features, using those features to locate objects in the image, and finally, giving the image a corresponding label. For example, when we learned about the ORB algorithm, we saw that we first need to locate the keypoints in both the training and query images; we then need to construct the feature vectors (ORB descriptors) for both images using their keypoints; we then use a matching function to compare the ORB descriptors of both images and determine if the object in the training image is located in the query image. Finally, we give the query image a corresponding label.

The End-to End approach aims to replace these type of pipelines with a single deep neural network that just takes as input the image and directly outputs the image label. The advantage of this approach is that you avoid taking all the intermediate steps required by the pipeline.

A great example of End-to-End learning was presented by Bojarski and co-workers in 2016 where they trained a single CNN using an End-to-End approach for a self-driving car. Using this approach, they were able to train the CNN to directly output the steering commands from the input frames of just a single front-facing camera. In the traditional approach, we will have to use a pipeline with many steps, such as extracting features that will allow it to detect lane markings and roads, path planning, and steering commands. In contrast, in this End-to-End approach the system is never explicitly trained to detect roads or car lanes. Nevertheless, it learned to drive in many different scenarios, such as unpaved roads, parking lots, highways, and roads with or without lane markings.

Examples like this, show that the End-to-End approach works remarkably well. For this reason, End-to-End learning has become one the most exciting developments in deep learning in the recent years.

Downsides

One of the downsides of End-to-End learning is that you need a lot of data to be able to train your deep neural network. This means that you need very large datasets, consisting of tens of thousands or even hundreds of thousands of training examples, in order for the End-to-End approach to work. In contrast, traditional pipelines can train well on much smaller datasets. So it is important to keep in mind that you should only use End-to-End learning if you have a large enough dataset to train your network.